Cheap Data Analytics using Cold Storage Devices

Authors

  • Renata Borovica
  • Raja Appuswamy
  • Anastasia Ailamaki
Abstract

Enterprise databases use storage tiering to lower capital and operational expenses. In such a setting, data waterfalls from an SSD-based high-performance tier when it is “hot” (frequently accessed) to a disk-based capacity tier and finally to a tape-based archival tier when “cold” (rarely accessed). To address the unprecedented growth in the amount of cold data, hardware vendors introduced new devices named Cold Storage Devices (CSD) explicitly targeted at cold data workloads. With access latencies in the tens of seconds and cost as low as $0.01/GB/month, CSD provide a middle ground between the low-latency (ms), high-cost, HDD-based capacity tier and the high-latency (min to h), low-cost, tape-based archival tier. Driven by the price/performance aspect of CSD, this paper makes a case for using CSD as a replacement for both the capacity and archival tiers of enterprise databases. Although CSD offer major cost savings, we show that current database systems can suffer a severe performance drop when CSD are used as a replacement for HDD, due to the mismatch between the design assumptions made by the query execution engine and the actual storage characteristics of the CSD. We then build a CSD-driven query execution framework, called Skipper, that modifies both the database execution engine and the CSD scheduling algorithms to be aware of each other. Using results from our implementation of the architecture based on PostgreSQL and OpenStack Swift, we show that Skipper is capable of completely masking the high latency overhead of CSD, thereby opening up CSD for wider adoption as a storage tier for cheap data analytics over cold data.
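To give a sense of scale for the quoted price point, the following is a minimal arithmetic sketch, not taken from the paper. Only the $0.01/GB/month figure comes from the abstract (where it is stated as a lower bound, "as low as"); the data volume, the decimal petabyte conversion, and the function name `monthly_csd_cost` are assumptions made for illustration.

```python
# Illustrative arithmetic only. The $0.01/GB/month price is the figure quoted
# in the abstract; the data volume and unit conversion are assumptions.
CSD_COST_PER_GB_MONTH = 0.01   # USD per GB per month (abstract: "as low as")
GB_PER_PB = 1_000_000          # decimal petabyte, an assumed convention


def monthly_csd_cost(volume_gb: float,
                     cost_per_gb_month: float = CSD_COST_PER_GB_MONTH) -> float:
    """Return the monthly storage cost in USD for volume_gb gigabytes."""
    if volume_gb < 0:
        raise ValueError("volume_gb must be non-negative")
    return volume_gb * cost_per_gb_month


if __name__ == "__main__":
    one_pb = 1 * GB_PER_PB  # hypothetical 1 PB of cold data
    print(f"1 PB on CSD: ${monthly_csd_cost(one_pb):,.2f} per month")
```

Under these assumptions, one petabyte of cold data would cost roughly $10,000 per month at the quoted price. The abstract gives no per-GB prices for the HDD or tape tiers, so no comparison is attempted here.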


Related resources

Application of Big Data Analytics in Power Distribution Network

Smart grid enhances optimization in generation, distribution and consumption of the electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, most common one being deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...


Big Data Analytics on Object Stores: A Performance Study

Object stores provide a highly scalable and cheap storage solution due to their key-value store semantics and commodity-hardware based deployment. This makes them an attractive option for archiving large amounts of data that are produced in science and industry. To analyze that data, advanced analytics such as MapReduce can be used. However, copying the data from the object store into the distr...


Fog-Assisted wIoT: A Smart Fog Gateway for End-to-End Analytics in Wearable Internet of Things

Today, wearable internet-of-things (wIoT) devices continuously flood the cloud data centers at an enormous rate. This increases a demand to deploy an edge infrastructure for computing, intelligence, and storage close to the users. The emerging paradigm of fog computing could play an important role to make wIoT more efficient and affordable. Fog computing is known as the cloud on the ground. Thi...


Big Data Management and Analytics for Mobile Crowd Sensing

With the fast increasing popularity of mobile smart devices, mobile crowd sensing has become a new paradigm of applications that enables the ubiquitous mobile devices with enhanced sensing capabilities, such as smartphones and wearable devices, to collect and to share local information towards a common goal. Most of the smart devices are equipped with a rich set of cheap and powerful sensors, f...


cniCloud: Querying the Cellular Network Information at Scale

This paper presents cniCloud, a cloud platform for mobile devices to share and query the fine-grained cellular information at scale. cniCloud extends the single-device cellular analytics via crowdsourcing: It collects the fine-grained cellular network data from massive mobile devices, aggregates them in a cloud database, and provides interfaces for end users to run SQL-like query over the cellular ...



Journal:
  • PVLDB

Volume 9

Publication date: 2016